Early prediction of an intention of a user's actions

ABSTRACT

A computer-implemented method includes recording, with a three-dimensional camera, one or more demonstrations of a user performing one or more reaching tasks. Training data is computed to describe the one or more demonstrations. One or more weights of a neural network are learned based on the training data, where the neural network is configured to estimate a goal location of the one or more reaching tasks. A partial trajectory of a new reaching task is recorded. An estimated goal location is computed, by a computer processor, by applying the neural network to the partial trajectory of the new reaching task.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to U.S. Provisional Application No. 62/366,663, filed on Jul. 26, 2016, the contents of which are incorporated by reference herein in their entirety.

BACKGROUND

Embodiments of the present invention relate to robotics and, more specifically, to early prediction of an intention of a user's actions.

Human intention inference is a natural step in achieving safety in human-robot collaboration. With intention inference, the robot can have knowledge of how the user will likely move. As a result, the robot can plan its own movements accordingly, so as not to collide with the user and so as not to perform redundant actions. Studies in psychology show that when two humans interact with each other, each one infers the intended actions of the other and decides based on this inference what proactive actions could be taken for safe interaction and collaboration. Thus, to enable robots to work more effectively with humans, improved human intention inference can be helpful.

SUMMARY

According to an embodiment of this disclosure, a computer-implemented method includes recording, with a three-dimensional camera, one or more demonstrations of a user performing one or more reaching tasks. Training data is computed to describe the one or more demonstrations. One or more weights of a neural network are learned based on the training data, where the neural network is configured to estimate a goal location of the one or more reaching tasks. A partial trajectory of a new reaching task is recorded. An estimated goal location is computed, by a computer processor, by applying the neural network to the partial trajectory of the new reaching task.

In another embodiment, a system includes a memory having computer readable instructions and one or more processors for executing the computer readable instructions. The computer readable instructions include recording, with a three-dimensional camera, one or more demonstrations of a user performing one or more reaching tasks. Further according to the computer readable instructions, training data is computed to describe the one or more demonstrations. One or more weights of a neural network are learned based on the training data, where the neural network is configured to estimate a goal location of the one or more reaching tasks. A partial trajectory of a new reaching task is recorded. An estimated goal location is computed, by applying the neural network to the partial trajectory of the new reaching task.

In yet another embodiment, a computer program product for inferring an intention includes a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to perform a method. The method includes recording, with a three-dimensional camera, one or more demonstrations of a user performing one or more reaching tasks. Further according to the method, training data is computed to describe the one or more demonstrations. One or more weights of a neural network are learned based on the training data, where the neural network is configured to estimate a goal location of the one or more reaching tasks. A partial trajectory of a new reaching task is recorded. An estimated goal location is computed, by applying the neural network to the partial trajectory of the new reaching task.

Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with the advantages and the features, refer to the description and to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a block diagram of a learning system, according to one or more embodiments of this disclosure;

FIG. 2 is a flow diagram of a method for training a neural network of the learning system to infer intentions of reaching tasks, according to one or more embodiments of this disclosure;

FIG. 3 is a flow diagram of a method for inferring intentions of reaching tasks, with online updating, according to one or more embodiments of this disclosure;

FIG. 4 is another flow diagram of a method for inferring intentions of reaching tasks, with online updating, according to one or more embodiments of this disclosure; and

FIG. 5 is a block diagram of a computer system for implementing some or all aspects of the learning system, according to one or more embodiments of this disclosure.

DETAILED DESCRIPTION

Embodiments of a learning system described herein infer the intention of a user's movements, such as arm movements, based on observations from a three-dimensional (3D) camera. In some embodiments, the intention that is inferred is a goal, or future, location of an arm reaching task in 3D space.

In some embodiments, the learning system models nonlinear motion dynamics of a user's arm using a nonlinear function, with intentions represented as parameters. A model for the motion may be learned through use of a neural network (NN). This model may be a state-space model, where the NN may be used to represent state propagation. Joint positions and velocities of a human skeletal structure of the user may be used as states, while intentions may be parameters of the state-space model.

Based on the learned model, an approximate expectation-maximization (EM) algorithm may be developed to infer human intentions. More specifically, by using a NN, intention inference may be solved as a parameter inference problem using the approximate EM algorithm. The intention inference problem may handle three sources of uncertainty: uncertain system dynamics, sensor measurement noise, and unknown human intent. Further, in some embodiments, an identifier-based online model learning algorithm may adapt to variations in the motion dynamics, motion trajectory, goal locations, and initial conditions of varying human users.

FIG. 1 is a block diagram of a learning system 100, according to one or more embodiments of this disclosure. The learning system 100 may determine an intention (i.e., an intended goal location) of a user, which may be a human user, performing a reaching task in a 3D workspace shared with a robot 120. Based on the intention, the learning system 100 may instruct the robot 120 on how to interact with the user.

The learning system 100 may include a training unit 150 and an online learning unit 160, both of which may be hardware, software, or a combination of hardware and software. Generally, the training unit 150 may train a neural network (NN) 170 based on training data derived from one or more user demonstrations of reaching tasks; and the online learning unit 160 may predict intentions for new reaching tasks and may also update the NN 170 based on test data derived from these new reaching tasks.

In some embodiments, the learning system 100 may be implemented on a computer system and may execute a training algorithm and an online learning algorithm, executed respectively by the training unit 150 and the online learning unit 160, as described below. Further, the learning system 100 may be in communication with a camera 180, such as Microsoft Kinect® for Windows® or some other camera capable of capturing data in three dimensions, which may record demonstrations of reaching tasks as well as new reaching tasks after initial training of the NN 170.

For the purpose of this disclosure, there may be a set of intentions G={g₁, g₂, . . . , g_(N)}, where each g_(i) ∈ ℝ³ represents a goal location, which may be a location in 3D space of an object in the workspace. Each reaching task performed may be associated with an intention g ∈ G, which may be a location at which the reaching task culminates. For example, and not by way of limitation, G may represent a finite set of locations on a table. Although it is assumed throughout this disclosure that G is a finite set, it will be understood that some embodiments may be extended to cover a continuous G.

In some embodiments, a state x_(t) may represent the various positions and velocities of selected points on a user's arm. For instance, x_(t) ∈ ℝ²⁴ may represent the positions and velocities of four points on the arm, specifically the shoulder, elbow, wrist, and palm, where these points describe overall movement of the arm at a given time step t. The variable z_(t) ∈ ℝ²⁴ may represent measurements obtained from the camera 180 at the time step t, where x_(t) may be derived based in part on z_(t), as will be described further below. Evolution of the state x_(t) over time may depend on both a previous value of the state (i.e., at a previous time step) and an estimated intention g.

FIG. 2 is a flow diagram of a method 200 for training the NN 170 to infer intentions of reaching tasks, according to some embodiments of this disclosure.

As shown, at block 205, the learning system 100 may record one or more demonstrations of a user performing reaching tasks. In some embodiments, each demonstration may include culmination of the reaching task at a corresponding goal location. Thus, each such demonstration may be associated with a corresponding goal location. As will be discussed further below, the NN 170 may be trained based on these demonstrations of reaching actions.

At block 210, the learning system 100 may obtain estimates of joint positions, velocities, and accelerations for the one or more demonstrations. Specifically, in some embodiments, the joint positions, velocities, and accelerations may be estimated by applying a Kalman filter to the data recorded at block 205, as described further below. Together, these estimates and the goal locations may be used as training data for the NN 170.

In some embodiments, the joint positions may be recorded by the camera 180 in block 205, and may thus be obtained in the camera's frame of reference. Where p^(c)=(x^(c), y^(c), z^(c))^(T) is a point in the camera's reference frame, and p^(r)=(x^(r), y^(r), z^(r))^(T) is a point in the robot's reference frame, the points p^(c) and p^(r) are related by p^(c)=R_(r)^(c)p^(r)+T_(r)^(c) (hereinafter “Formula 1”). In Formula 1, R_(r)^(c) ∈ SO(3) and T_(r)^(c) ∈ ℝ³ are, respectively, a rotation matrix and a translation vector.
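As a minimal sketch of Formula 1, the following Python snippet maps a point between the two reference frames. The rotation and translation values are hypothetical calibration numbers chosen only for illustration, and the function names are not part of this disclosure.

```python
import numpy as np

def robot_to_camera(p_r, R_rc, T_rc):
    """Formula 1: p^c = R_r^c p^r + T_r^c."""
    return R_rc @ p_r + T_rc

def camera_to_robot(p_c, R_rc, T_rc):
    """Inverse of Formula 1; for a rotation matrix, the inverse is the transpose."""
    return R_rc.T @ (p_c - T_rc)

# Hypothetical calibration: 90-degree rotation about the z axis plus an offset.
R_rc = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
T_rc = np.array([0.5, 0.0, 1.2])

p_r = np.array([0.3, 0.1, 0.8])            # a point in the robot frame
p_c = robot_to_camera(p_r, R_rc, T_rc)     # the same point in the camera frame
p_r_back = camera_to_robot(p_c, R_rc, T_rc)
```

Because R_(r)^(c) is a rotation, mapping camera measurements back into the robot frame needs only a transpose and a subtraction.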

The camera 180 may measure 3D positions of the user's joints, which may be used as raw positions. These raw positions may be input into a Kalman filter to obtain the velocity and acceleration estimates of the joints.
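One way such a filter could be set up is sketched below for a single coordinate of a single joint. The constant-acceleration motion model, the noise levels q and r, the initial covariance, and the function name are all assumptions made for illustration; the disclosure does not specify them.

```python
import numpy as np

def kalman_filter_axis(positions, dt, q=1.0, r=1e-4):
    """Estimate [position, velocity, acceleration] from raw positions of one axis.

    Assumes a constant-acceleration model driven by white jerk noise (intensity q)
    and position-only measurements with variance r.
    """
    F = np.array([[1.0, dt, 0.5 * dt**2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0]])            # only position is measured
    G = np.array([dt**3 / 6.0, dt**2 / 2.0, dt])
    Q = q * np.outer(G, G)                     # process noise covariance

    x = np.array([positions[0], 0.0, 0.0])     # start at rest at the first sample
    P = 10.0 * np.eye(3)                       # loose initial uncertainty
    estimates = []
    for z in positions:
        # Prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        # Measurement update
        S = H @ P @ H.T + r                    # innovation variance, shape (1, 1)
        K = (P @ H.T) / S                      # Kalman gain, shape (3, 1)
        x = x + (K * (z - H @ x)).ravel()
        P = (np.eye(3) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Example: a joint coordinate moving at a constant 1.0 m/s, sampled at 30 Hz.
dt = 1.0 / 30.0
t = np.arange(100) * dt
raw_positions = 0.2 + 1.0 * t
est = kalman_filter_axis(raw_positions, dt)
```

Running the same filter on each of the three coordinates of each tracked joint would yield the position, velocity, and acceleration estimates used as training data.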

In some embodiments, a measurement model may be represented in the generic form y_(t)=h(x_(t))+v_(t), where the measurement function is h(x_(t))=Hx_(t)+b; b=[[T_(r)^(c)]^(T), [T_(r)^(c)]^(T), [T_(r)^(c)]^(T), [T_(r)^(c)]^(T), 0_(1×12)]^(T); H=diag{R_(r)^(c), R_(r)^(c), . . . , R_(r)^(c)} ∈ ℝ^(24×24) is a block diagonal matrix; and {v_(t)}˜N(0, Σ_(z)) ∈ ℝ²⁴ is a zero-mean Gaussian noise with a covariance matrix Σ_(z) ∈ ℝ^(24×24). The measurement noise {v_(t)} may be assumed to be independent of the process noise {ω_(t)} defined in Formula 2, provided below. Further, the measurement model of the shifted measurement vector z_(t)=y_(t)−b at time t may be given by z_(t)=Hx_(t)+v_(t).
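The structure of H and b can be illustrated with a short sketch. It assumes, as the form of b suggests, that the first 12 state entries are the four joint positions and the last 12 are the corresponding velocities; the calibration values and the helper name are hypothetical.

```python
import numpy as np

def build_measurement_model(R_rc, T_rc):
    """Return H (24x24 block diagonal) and b (24,) for h(x) = H x + b."""
    H = np.kron(np.eye(8), R_rc)                          # eight copies of R on the diagonal
    b = np.concatenate([np.tile(T_rc, 4), np.zeros(12)])  # translation on positions only
    return H, b

# Hypothetical calibration: 30-degree rotation about z plus an offset.
theta = np.pi / 6
R_rc = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
T_rc = np.array([0.5, 0.0, 1.2])
H, b = build_measurement_model(R_rc, T_rc)

rng = np.random.default_rng(0)
x = rng.normal(size=24)       # an arbitrary state, for illustration only
y = H @ x + b                 # noise-free measurement
z = y - b                     # shifted measurement, z = H x
x_recovered = H.T @ z         # H is orthogonal, so its inverse is its transpose
```

Since every diagonal block is a rotation, H is orthogonal, which is why a noise-free measurement can be mapped back to the state with a single transpose.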

At block 215, the learning system 100 may generate training data based on the joint positions, velocities, and accelerations. For instance, in the training data, the joint positions, velocities, and accelerations for each reaching task may be associated with the corresponding goal location of that reaching task.

At block 220, the learning system 100 may learn the NN 170, based on the one or more demonstrations, specifically, based on the training data derived from the demonstrations. More specifically, learning the NN 170 may include learning the weights of the NN 170, as described in detail below.

In some embodiments, a state transition model describes evolution of the state of the robot 120 over time. The state transition model may be described by the equation ẋ_(t)=f_(c)*(x_(t), g)+ω_(t) (hereinafter “Formula 2”). In this equation, {ω_(t)}˜N(0, Q_(c)) ∈ ℝ²⁴ is a zero-mean Gaussian random process with a covariance matrix Q_(c) ∈ ℝ^(24×24); and f_(c)*(x_(t), g): ℝ²⁴×ℝ³→ℝ²⁴ is assumed to be an analytical function.

The nonlinear function f_(c)*(x_(t), g), of Formula 2, may be modeled using the NN 170 given by f_(c)*(x_(t), g)=W^(T)σ(U^(T)s_(t))+ε(s_(t)), where s_(t)=[[x_(t)^(T), g^(T)], 1]^(T) ∈ ℝ²⁸ is a vector input to the NN 170;

$\sigma\left(U^{T}s_{t}\right) = \left[\frac{1}{1+\exp\left(-\left(U^{T}s_{t}\right)_{1}\right)},\ \frac{1}{1+\exp\left(-\left(U^{T}s_{t}\right)_{2}\right)},\ \ldots,\ \frac{1}{1+\exp\left(-\left(U^{T}s_{t}\right)_{i}\right)},\ \ldots,\ \frac{1}{1+\exp\left(-\left(U^{T}s_{t}\right)_{n_{h}}\right)}\right]^{T}$

is a vector-sigmoid activation function; (U^(T)s_(t))_(i) is the i^(th) element of the vector (U^(T)s_(t)); U ∈ ℝ^(28×n_(h)) and W ∈ ℝ^(n_(h)×24) are bounded constant weight matrices; ε(s_(t)) ∈ ℝ²⁴ is a function reconstruction error that goes to zero after the NN 170 is fully trained; and n_(h) ∈ ℤ⁺ is the number of neurons in a hidden layer of the NN 170.
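The single-hidden-layer form above can be sketched in a few lines. The hidden-layer size, the random weights, and the function names below are placeholders for illustration only; trained weights would come from block 220, and the reconstruction error term is omitted.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function used as the vector-sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-a))

def nn_dynamics(x, g, U, W):
    """Evaluate W^T sigma(U^T s) with s = [x, g, 1], ignoring the error term."""
    s = np.concatenate([x, g, [1.0]])   # 24 states + 3 goal coordinates + bias = 28
    return W.T @ sigmoid(U.T @ s)       # 24-dimensional state derivative

n_h = 10                                # hypothetical number of hidden neurons
rng = np.random.default_rng(0)
U = 0.1 * rng.normal(size=(28, n_h))    # placeholder input-to-hidden weights
W = 0.1 * rng.normal(size=(n_h, 24))    # placeholder hidden-to-output weights

x = np.zeros(24)                        # arm at rest at the origin
g = np.array([0.4, 0.1, 0.0])           # a candidate goal location
x_dot = nn_dynamics(x, g, U, W)
```

The trailing 1 in s lets the last row of U act as a bias for the hidden layer.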

The learning system 100 may train the NN 170 using Bayesian regularization, a robust algorithm suitable for applications where training data is noisy and the sample size is small, as may be the case in some embodiments. In some embodiments, the NN 170 may be trained using Bayesian regularization with the objective function J(U, W)=K_(α)E_(D)+K_(β)E_(W), where E_(D)=Σ_(i)(y_(i)−a_(i))² is a sum of squared errors; y_(i) is the target location; a_(i) is the NN's output; E_(W) is a sum of the squares of weights of the NN 170; and K_(α) and K_(β) are regularization parameters that can be used to shift emphasis between reducing reconstruction errors and reducing weight sizes, respectively.
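The objective J can be evaluated directly, as in the sketch below. It only computes the regularized cost for fixed K_(α) and K_(β); a full Bayesian regularization routine would also re-estimate those two parameters during training, which is not shown. The function name and the sample numbers are illustrative assumptions.

```python
import numpy as np

def regularized_objective(targets, outputs, weights, k_alpha, k_beta):
    """J = K_alpha * E_D + K_beta * E_W for fixed regularization parameters."""
    E_D = np.sum((targets - outputs) ** 2)          # sum of squared errors
    E_W = sum(np.sum(w ** 2) for w in weights)      # sum of squared weights
    return k_alpha * E_D + k_beta * E_W

# Tiny illustrative example with made-up numbers.
targets = np.array([1.0, 2.0])
outputs = np.array([0.0, 0.0])
weights = [np.ones((2, 2)), np.ones(3)]             # stand-ins for U and W
J = regularized_objective(targets, outputs, weights, k_alpha=2.0, k_beta=0.5)
```

Raising K_(β) relative to K_(α) penalizes large weights more heavily, which tends to give smoother fits when demonstrations are few or noisy.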

FIG. 3 is a flow diagram of a method 300 for inferring intentions of reaching tasks, with online updating, according to some embodiments of this disclosure. In some embodiments, the method 300 of FIG. 3 may occur after initial training of the NN 170 according to the method 200 of FIG. 2.

At block 325, the learning system 100 may record new measurements of a new reaching task being performed by a user. The user may or may not be a user who demonstrated reaching tasks incorporated into the training data. The new reaching task may be unfinished and may thus form only a partial trajectory toward the user's intended goal location. In some embodiments, these new measurements may be recorded by the camera 180.

At block 330, the learning system 100 may generate new data from the recorded measurements, and that new data may include estimates of joint positions, velocities, and accelerations of the new reaching task. Again, the generation of this data may be performed through application of a Kalman filter, this time with respect to the new measurements. However, unlike the training data, this new data may exclude actual goal locations. This new data may be used as the value of Z_(T) in the formulas presented below, where Z_(T) refers to a set of measurements from an initial time instance to a current time T.

At block 335, the learning system 100 may initialize various parameters and variables to be used in updating the NN 170 and in estimating an intention (i.e., an intended goal location) of the new reaching task. For example, and not by way of limitation, this may include initializing x̂₀, P̂₀, x̂_(id₀), and ĝ₀; defining the parameters μ₀, P₀, Q, and Σ_(z); and defining the gains for the online update algorithm k, α, γ, β₁, Γ_(W), Γ_(U_(x)), and Γ_(U_(g)). These variables and parameters will be described further below.

At block 340, an iterative loop begins for determining the intention gof the new reaching task.

The learning system 100 may iteratively infer an intention g of the new reaching task, as the new measurements become available. In other words, the intention may be generated repeatedly, thus becoming more refined. In some embodiments, the EM algorithm requires the state transition model to be in discrete form. As such, the state transition model defined in Formula 2 may be discretized using first-order Euler approximation, yielding x_(t)=f(x_(t−1), g)+ω_(t)T_(s) (hereinafter “Formula 3”), where f(x_(t−1), g)=x_(t−1)+W^(T)σ(U^(T)s_(t−1))T_(s) and T_(s) is the sampling period.
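A noise-free rollout of the discretized model in Formula 3 might look like the following sketch. The weights are random placeholders, the 30 Hz sampling period is an assumed value, and the process noise term is left out so that the rollout shows only the deterministic part f.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def discrete_step(x_prev, g, U, W, Ts):
    """f(x_{t-1}, g) = x_{t-1} + W^T sigma(U^T s_{t-1}) * Ts (first-order Euler)."""
    s_prev = np.concatenate([x_prev, g, [1.0]])
    return x_prev + W.T @ sigmoid(U.T @ s_prev) * Ts

def rollout(x0, g, U, W, Ts, n_steps):
    """Propagate the state n_steps forward for a fixed candidate goal g."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(discrete_step(xs[-1], g, U, W, Ts))
    return np.array(xs)

n_h = 10                                   # hypothetical hidden-layer size
rng = np.random.default_rng(1)
U = 0.1 * rng.normal(size=(28, n_h))       # placeholder weights
W = 0.1 * rng.normal(size=(n_h, 24))
Ts = 1.0 / 30.0                            # assumed sampling period
g = np.array([0.4, 0.1, 0.0])
trajectory = rollout(np.zeros(24), g, U, W, Ts, n_steps=5)
```

Each row of the result is one predicted state, so different candidate goals can be compared by how well their rollouts match the observed partial trajectory.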

To infer intention, the learning system 100 may aim to maximize the likelihood of the new data Z_(T) given the intention g, using a maximum-likelihood (ML) criterion, where Z_(T)=z_(1:T) is a set of observations from time t=1 to t=T, and where T can differ between the training data and the new data. In other words, Z_(T) refers to a set of measurements from an initial time instance to the time T, while each z_(t) refers to an isolated measurement at time t. Process noise of the discretized system in Formula 3 may be given by Q=T_(s)²Q_(c). The log-likelihood function of the intention g, l(g), may be l(g)=log p(Z_(T)|g) (hereinafter “Formula 4”), which can be obtained by marginalizing the joint distribution l(g)=log ∫p(X_(T), Z_(T)|g)dX_(T) (hereinafter “Formula 5”). In Formula 5, X_(T)=x_(1:T) may be a collective representation of states from time t=1 to t=T.

In some embodiments, the learning system 100 may use an approximate EM algorithm with modifications for handling state transition models trained using the NN 170. Using the fact that E_(X_(T)){log [p(Z_(T)|g)]|Z_(T), ĝ_(t)}=log p(Z_(T)|g), the log-likelihood defined in Formula 4 may be decomposed as

$\log p(Z_{T} \mid g) = E_{X_{T}}\left\{\log\left[p(Z_{T}, X_{T} \mid g)\right] \mid Z_{T}, \hat{g}_{t}\right\} - E_{X_{T}}\left\{\log\left[p(X_{T} \mid Z_{T}, g)\right] \mid Z_{T}, \hat{g}_{t}\right\}$

$\log p(Z_{T} \mid g) = Q(g, \hat{g}_{t}) - H(g, \hat{g}_{t})$

(hereinafter “Formula 6” and “Formula 7,” respectively).

In the above Formulas 6 and 7, E_(X_(T)) is the expectation operator; ĝ_(t) is an estimate of the intention g at time t; Q(g, ĝ_(t))=E_(X_(T)){log [p(Z_(T), X_(T)|g)]|Z_(T), ĝ_(t)} is an expected value of the complete data log-likelihood given the measurements Z_(T) and the intention estimate ĝ_(t); and H(g, ĝ_(t))=E_(X_(T)){log [p(X_(T)|Z_(T), g)]|Z_(T), ĝ_(t)}. It can be shown using Jensen's inequality that H(g, ĝ_(t))≤H(ĝ_(t), ĝ_(t)). Thus, to iteratively increase the log-likelihood, it may be required to choose g such that Q(g, ĝ_(t))≥Q(ĝ_(t), ĝ_(t)).
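The step from the bound on H to the condition on Q can be made explicit. Subtracting Formula 7 evaluated at ĝ_(t) from Formula 7 evaluated at g gives the following, which is the standard EM argument restated in the notation above rather than text taken from this disclosure:

```latex
\begin{aligned}
\log p(Z_T \mid g) - \log p(Z_T \mid \hat{g}_t)
  &= \left[ Q(g, \hat{g}_t) - Q(\hat{g}_t, \hat{g}_t) \right]
   + \left[ H(\hat{g}_t, \hat{g}_t) - H(g, \hat{g}_t) \right] \\
  &\geq Q(g, \hat{g}_t) - Q(\hat{g}_t, \hat{g}_t).
\end{aligned}
```

Because the second bracket is nonnegative by the Jensen bound, any g that does not decrease Q cannot decrease the log-likelihood.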

As described in more detail below, the learning system 100 may compute the auxiliary function Q(g, ĝ_(t)) given the observations Z_(T) and a current estimate of the intention ĝ_(t). The learning system 100 may also compute the next intention estimate ĝ_(t+1), after the current intention estimate, by finding a value of g that maximizes Q(g, ĝ_(t)).

At block 345, the learning system 100 may read a current measurementz_(t) of the new reaching task.

At block 350, based on the current NN 170 and the previous intention estimate ĝ_(t−1), the learning system 100 may compute x̂_(t), P̂_(t), and P̂_(t,t−1), respectively a state estimate, the covariance of the state estimate, and the cross covariance of the state estimates at times t and t−1. This computation may be performed by using an extended Kalman filter (EKF). These quantities may be used in computing the auxiliary function Q(g, ĝ_(t)) given the observations Z_(T) and a current estimate of the intention ĝ_(t).

To this end, the learning system 100 may evaluate the expectation of the complete data log-likelihood given by Q(g, ĝ_(t))=E_(X_(T)){log [p(Z_(T), X_(T)|g)]|Z_(T), ĝ_(t)}=E_(X_(T)){V₀+Σ_(t=1)^(T)V_(t)(x_(t), x_(t−1), g)|Z_(T), ĝ_(t)} (hereinafter “Formula 8”). If {v_(t)} and {ω_(t)} are Gaussian, V₀ and V_(t)(x_(t), x_(t−1), g) may be given by V₀=log [p(x₀|g)]=log [p(x₀)]=const−½ log [|P₀|]−½(x₀−μ₀)^(T)P₀⁻¹(x₀−μ₀) and V_(t)(x_(t), x_(t−1), g)=log [p(z_(t)|x_(t))]+log [p(x_(t)|x_(t−1), g)] (hereinafter collectively “Formula 9”), where μ₀ and P₀ are respectively the initial state mean and covariance; |·| is the determinant operator; log [p(z_(t)|x_(t))]=−½ log [|Σ_(z)|]−½{(z_(t)−h(x_(t)))^(T)Σ_(z)⁻¹(z_(t)−h(x_(t)))} and log [p(x_(t)|x_(t−1), g)]=−½ log [|Q|]−½{(x_(t)−f(x_(t−1), g))^(T)Q⁻¹(x_(t)−f(x_(t−1), g))} (hereinafter “Formula 10”).
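The two Gaussian terms of Formula 10 can be evaluated as in the sketch below, which, like Formula 10, drops the additive constant involving 2π. The function names are illustrative, the measurement term uses the shifted measurement z_(t)=Hx_(t)+v_(t), and the small two-dimensional numbers are made up purely to exercise the code.

```python
import numpy as np

def log_measurement(z, x, H, Sigma_z):
    """log p(z_t | x_t) up to a constant: -1/2 log|Sigma_z| - 1/2 r^T Sigma_z^-1 r."""
    r = z - H @ x
    _, logdet = np.linalg.slogdet(Sigma_z)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(Sigma_z, r)

def log_transition(x, f_prev, Q):
    """log p(x_t | x_{t-1}, g) up to a constant, with f_prev = f(x_{t-1}, g)."""
    d = x - f_prev
    _, logdet = np.linalg.slogdet(Q)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Q, d)

# Toy two-dimensional check with made-up values.
I2 = np.eye(2)
lm = log_measurement(np.array([1.0, 1.0]), np.zeros(2), I2, I2)
lt = log_transition(np.array([0.3, 0.7]), np.array([0.3, 0.7]), 4.0 * I2)
```

Using slogdet and solve avoids forming explicit inverses or determinants, which matters little in two dimensions but helps with 24-dimensional covariances.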

In Formula 9, log [p(z_(t)|x_(t), g)] is replaced by log [p(z_(t)|x_(t))]. This is because, for the intention inference problem, the measurement z_(t) need not depend on the intention g. When attempting to optimize Formula 9, a difficulty arises due to the nonlinearity of the state transition model. Thus, in some embodiments, the nonlinear state transition model may be represented by the NN 170. In order to compute the expectation of the log-likelihood in Formula 10, the expression inside the curly brackets of Formula 10 may be linearized about x̄_(t) and x̄_(t−1) using the Taylor series expansion. In practice, the points of linearization {x̄_(t)} may be obtained from the measurements of the joints by ignoring measurement noise and inverting h(·), the measurement function, which may be a simple affine transformation that is invertible.

Where V̄_(t)=(x_(t)−f(x_(t−1), g))^(T)Q⁻¹(x_(t)−f(x_(t−1), g)), the Taylor series expansion of V̄_(t) may be

$\begin{aligned}
\bar{V}_{t} \approx{} & \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)
 + \left[\frac{\partial \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t}}\right]^{T}\left[x_{t} - \bar{x}_{t}\right]
 + \left[\frac{\partial \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t-1}}\right]^{T}\left[x_{t-1} - \bar{x}_{t-1}\right] \\
 & + \frac{1}{2}\left[x_{t} - \bar{x}_{t}\right]^{T}\frac{\partial^{2} \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t}\,\partial x_{t}}\left[x_{t} - \bar{x}_{t}\right]
 + \frac{1}{2}\left[x_{t-1} - \bar{x}_{t-1}\right]^{T}\frac{\partial^{2} \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t-1}\,\partial x_{t-1}}\left[x_{t-1} - \bar{x}_{t-1}\right] \\
 & + \frac{1}{2}\left[x_{t} - \bar{x}_{t}\right]^{T}\frac{\partial^{2} \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t}\,\partial x_{t-1}}\left[x_{t-1} - \bar{x}_{t-1}\right] + \ldots
\end{aligned}$

(hereinafter “Formula 11”).

The derivatives of V̄_(t) may be given by the following:

$\frac{\partial \bar{V}_{t}}{\partial x_{t}} = \left(Q^{-1} + Q^{-T}\right)\left(x_{t} - f(x_{t-1}, g)\right)$

$\frac{\partial \bar{V}_{t}}{\partial (x_{t-1})_{i}} = \left[\frac{\partial \bar{V}_{t}}{\partial f}\right]^{T}\frac{\partial f}{\partial (x_{t-1})_{i}}$

$\frac{\partial^{2} \bar{V}_{t}}{\partial x_{t}\,\partial x_{t}} = Q^{-1} + Q^{-T}$

$\frac{\partial^{2} \bar{V}_{t}}{\partial x_{t}\,\partial x_{t-1}} = -\left(Q^{-1} + Q^{-T}\right)\left[\frac{\partial f}{\partial x_{t-1}}\right]$

$\frac{\partial^{2} \bar{V}_{t}}{\partial (x_{t-1})_{i}\,\partial (x_{t-1})_{j}} = \left[\frac{\partial^{2} \bar{V}_{t}}{\partial f\,\partial (x_{t-1})_{i}}\right]^{T}\frac{\partial f}{\partial (x_{t-1})_{j}} + \frac{\partial^{2} f}{\partial (x_{t-1})_{j}\,\partial (x_{t-1})_{i}}\left[\frac{\partial \bar{V}_{t}}{\partial f}\right]$

(hereinafter “Formula 12,” “Formula 13,” “Formula 14,” “Formula 15,” and “Formula 16,” respectively), where

$\left[\frac{\partial \bar{V}_{t}}{\partial f}\right] = -\left[Q^{-1} + Q^{-T}\right]\left[x_{t} - f(x_{t-1}, g)\right]$ and $\left[\frac{\partial^{2} \bar{V}_{t}}{\partial f\,\partial (x_{t-1})_{i}}\right] = \left[Q^{-1} + Q^{-T}\right]^{T}\frac{\partial f}{\partial (x_{t-1})_{i}}.$

It is noted that $\frac{\partial f}{\partial x_{t-1}}$ is the sub-matrix of the Jacobian of the NN that can be obtained by ignoring the matrix columns pertaining to $\frac{\partial f}{\partial g}$. Thus, the Jacobian $\frac{\partial f}{\partial x_{t}}$ can be derived by taking the first n columns of $\frac{\partial f}{\partial s}$, where n is the number of states (i.e., the dimension of the state vector x_(t)). The Hessian $\frac{\partial^{2} f}{\partial (x_{t})\,\partial (x_{t})}$ can be derived in a similar fashion.
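The column-slicing described above can be sketched for the discretized model f(x, g)=x+W^(T)σ(U^(T)s)T_(s). The analytic Jacobian below follows from the chain rule with σ′=σ(1−σ); the weights, sizes, and function names are placeholder assumptions rather than values from this disclosure.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def discrete_step(x_prev, g, U, W, Ts):
    s = np.concatenate([x_prev, g, [1.0]])
    return x_prev + W.T @ sigmoid(U.T @ s) * Ts

def jacobian_wrt_input(x_prev, g, U, W, Ts):
    """df/ds for s = [x, g, 1]; columns are ordered as states, goal, bias."""
    n = x_prev.size
    s = np.concatenate([x_prev, g, [1.0]])
    sig = sigmoid(U.T @ s)
    sig_prime = sig * (1.0 - sig)                      # derivative of the sigmoid
    J = Ts * (W.T @ (sig_prime[:, None] * U.T))        # NN part, shape (n, n + 4)
    J[:, :n] += np.eye(n)                              # identity from the x_{t-1} term
    return J

n, n_h = 24, 10                                        # n_h is a hypothetical size
rng = np.random.default_rng(2)
U = 0.1 * rng.normal(size=(n + 4, n_h))                # placeholder weights
W = 0.1 * rng.normal(size=(n_h, n))
Ts = 1.0 / 30.0
x = rng.normal(size=n)
g = np.array([0.4, 0.1, 0.0])

J = jacobian_wrt_input(x, g, U, W, Ts)
df_dx = J[:, :n]            # first n columns: derivative with respect to the state
df_dg = J[:, n:n + 3]       # next three columns: derivative with respect to the goal
```

The final column of J corresponds to the constant bias input and is simply discarded.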

Using Formulas 11 through 16, the expectation in Formula 8 can be written as

$\begin{aligned}
Q(g, \hat{g}_{t}) ={} & -\frac{1}{2}\log\left[|P_{0}|\right] - \frac{1}{2}\mathrm{tr}\left\{P_{0}^{-1}\left(\hat{P}_{0} + (\hat{x}_{0} - \mu_{0})(\hat{x}_{0} - \mu_{0})^{T}\right)\right\} - \frac{T}{2}\log\left[|\Sigma_{z}|\right] - \frac{T}{2}\log\left[|Q|\right] \\
 & - \frac{1}{2}\sum_{t=1}^{T}\mathrm{tr}\left\{\Sigma_{z}^{-1}\left(\left[z_{t} - H\hat{x}_{t}\right]\left[z_{t} - H\hat{x}_{t}\right]^{T} + H\hat{P}_{t}H^{T}\right)\right\} - \frac{1}{2}\sum_{t=1}^{T}\bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g) \\
 & - \frac{1}{2}\sum_{t=1}^{T}\left[\frac{\partial \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t}}\right]^{T}\left[\hat{x}_{t} - \bar{x}_{t}\right] - \frac{1}{2}\sum_{t=1}^{T}\left[\frac{\partial \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t-1}}\right]^{T}\left[\hat{x}_{t-1} - \bar{x}_{t-1}\right] \\
 & - \frac{1}{4}\sum_{t=1}^{T}\mathrm{tr}\left\{\frac{\partial^{2} \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t}\,\partial x_{t}}\left(\hat{P}_{t} + \left[\hat{x}_{t} - \bar{x}_{t}\right]\left[\hat{x}_{t} - \bar{x}_{t}\right]^{T}\right)\right\} \\
 & - \frac{1}{4}\sum_{t=1}^{T}\mathrm{tr}\left\{\frac{\partial^{2} \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t-1}\,\partial x_{t-1}}\left(\hat{P}_{t-1} + \left[\hat{x}_{t-1} - \bar{x}_{t-1}\right]\left[\hat{x}_{t-1} - \bar{x}_{t-1}\right]^{T}\right)\right\} \\
 & - \frac{1}{4}\sum_{t=1}^{T}\mathrm{tr}\left\{\frac{\partial^{2} \bar{V}_{t}(\bar{x}_{t}, \bar{x}_{t-1}, g)}{\partial x_{t}\,\partial x_{t-1}}\left(\hat{P}_{t,t-1} + \left[\hat{x}_{t} - \bar{x}_{t}\right]\left[\hat{x}_{t-1} - \bar{x}_{t-1}\right]^{T}\right)\right\} - \ldots
\end{aligned}$

(hereinafter “Formula 17”).

In the above, x̂_(t) and P̂_(t) may be the state estimate and its covariance, respectively, while x̂₀ and P̂₀ may be their respective initial values; and P̂_(t,t−1) may be the cross covariance of the state estimates at times t and t−1. The state estimate x̂_(t) and the covariances P̂_(t) and P̂_(t,t−1) may be obtained using an EKF. To linearize the transition model for the EKF at a current time t, the learning system 100 may use the state estimate x̂_(t−1) from a previous time as a point of linearization. Formula 17 can be written in an iterative form to calculate the value of the Q function at every iteration. Thus, Formula 17 may be solved iteratively, with one iteration performed at each time step, such that an estimate of the goal location may be obtained at each time step.
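A single EKF iteration producing the three quantities named above could be sketched as follows. The disclosure does not give the filter equations, so this uses the textbook EKF recursion, and the cross-covariance expression (I − KH)F P̂_(t−1) is the standard filtered lag-one form, adopted here as an assumption. The callables f and F_jac stand for the transition function and its Jacobian with the current goal estimate already fixed.

```python
import numpy as np

def ekf_step(x_prev, P_prev, z, f, F_jac, H, Q, Sigma_z):
    """One EKF iteration returning the state estimate, its covariance, and the
    cross covariance between the estimates at times t and t-1."""
    F = F_jac(x_prev)                          # linearize about the previous estimate
    x_pred = f(x_prev)
    P_pred = F @ P_prev @ F.T + Q
    S = H @ P_pred @ H.T + Sigma_z             # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T       # gain P_pred H^T S^-1 (S, P_pred symmetric)
    x_new = x_pred + K @ (z - H @ x_pred)
    I_KH = np.eye(x_prev.size) - K @ H
    P_new = I_KH @ P_pred
    P_cross = I_KH @ F @ P_prev                # assumed lag-one cross covariance
    return x_new, P_new, P_cross

# Toy linear check with made-up two-dimensional values.
I2 = np.eye(2)
x_hat, P_hat, P_cross = ekf_step(
    x_prev=np.zeros(2), P_prev=I2, z=np.array([2.0, 2.0]),
    f=lambda x: x, F_jac=lambda x: I2,
    H=I2, Q=np.zeros((2, 2)), Sigma_z=I2)
```

In the toy case the prior and the measurement are equally uncertain, so the estimate lands halfway between them.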

At block 355, the learning system 100 may compute the intention estimate ĝ_(t), such as by finding a value of g that maximizes Q(g, ĝ_(t)). In some embodiments, the learning system 100 may optimize Q(g, ĝ_(t)) over g as described by ĝ_(t+1)=arg max_(g)Q(g, ĝ_(t)). This may be executed in various ways, such as by numerical optimization or direct evaluation as described below.

When using numerical optimization of the Q function, Q(g, ĝ_(t)), the learning system 100 may attempt to maximize the Q function through the use of a GradEM algorithm, where the first few iterations of the algorithm are used for this purpose. This technique may include optimizing the Q function over ℝ³. An update equation for ĝ_(t), through the GradEM algorithm, may be given by ĝ_(k+1)=ĝ_(k)−ℋ(Q)⁻¹∇(Q) (hereinafter “Formula 18”), where ĝ_(k) may be an estimate of g at the k^(th) iteration of the optimization algorithm, and ℋ(Q) and ∇(Q) are respectively the Hessian and gradient of the Q function.

In some embodiments, numerical optimization may be performed at each time step of the EM algorithm, where each time step is a point in time at which the intention is reevaluated based on available new data for the new reaching task. Further, in some embodiments, the number of iterations for the optimization in Formula 18 may be chosen based on computational capabilities, and the Hessian of the Q function may be numerically approximated.
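The Newton-type update of Formula 18 with a numerically approximated gradient and Hessian can be sketched as below. The central-difference step size, the iteration count, and the concave quadratic used as a stand-in for the Q function are all illustrative assumptions.

```python
import numpy as np

def numerical_gradient(fun, g, eps=1e-3):
    """Central-difference gradient of a scalar function of g."""
    grad = np.zeros_like(g)
    for i in range(g.size):
        e = np.zeros_like(g)
        e[i] = eps
        grad[i] = (fun(g + e) - fun(g - e)) / (2.0 * eps)
    return grad

def numerical_hessian(fun, g, eps=1e-3):
    """Central-difference Hessian of a scalar function of g."""
    n = g.size
    hess = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = eps
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = eps
            hess[i, j] = (fun(g + ei + ej) - fun(g + ei - ej)
                          - fun(g - ei + ej) + fun(g - ei - ej)) / (4.0 * eps**2)
    return hess

def grad_em_update(fun, g0, n_iter=3):
    """Apply g_{k+1} = g_k - Hessian^-1 * gradient for a few iterations."""
    g = g0.copy()
    for _ in range(n_iter):
        g = g - np.linalg.solve(numerical_hessian(fun, g), numerical_gradient(fun, g))
    return g

# Stand-in Q function: concave quadratic peaked at a hypothetical goal location.
true_goal = np.array([0.4, -0.2, 0.9])
q_stand_in = lambda g: -np.sum((g - true_goal) ** 2)
g_hat = grad_em_update(q_stand_in, np.zeros(3))
```

For a real Q function the number of iterations per time step would be limited by the available computation, as noted above.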

In contrast, when using direct evaluation of the Q function, the learning system 100 may infer g by evaluating the Q function for all possible g_(i) (i.e., all possible goal locations) in G, so as to obtain ĝ_(t+1) as described by ĝ_(t+1)=arg max_(g∈G)Q(g, ĝ_(t)). This technique of evaluating the Q function may be executable if all possible goal locations g_(i) are known a priori and are finite in number. This need not be an unusual case in the context of the described problem scenario. For instance, the learning system 100 may use the camera 180 along with image processing algorithms to detect objects on the workbench, and to extract the 3D positions of those objects. The learning system 100 may then assume that each reaching task has a goal location equivalent to one of these 3D positions.
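Direct evaluation over a finite goal set reduces to scoring each candidate and taking the best one, as in the sketch below. To keep the example short, a toy dynamics function replaces the trained NN and a negative sum of squared one-step prediction errors replaces the full Q function of Formula 17; both substitutions, the goal coordinates, and the function names are assumptions for illustration.

```python
import numpy as np

def toy_step(x_prev, g, rate=0.1):
    """Toy stand-in for f(x_{t-1}, g): move a fixed fraction toward the goal."""
    return x_prev + rate * (g - x_prev)

def q_surrogate(traj, g):
    """Simplified score: negative squared one-step prediction error under goal g."""
    pred = np.array([toy_step(x, g) for x in traj[:-1]])
    return -0.5 * np.sum((traj[1:] - pred) ** 2)

def infer_goal_direct(traj, goals):
    """Evaluate the score for every candidate goal and return the best index."""
    scores = np.array([q_surrogate(traj, g) for g in goals])
    return int(np.argmax(scores)), scores

# Hypothetical object positions detected on the workbench.
goals = np.array([[0.6, 0.2, 0.0],
                  [0.3, -0.4, 0.0],
                  [0.8, -0.1, 0.1]])

# Partial trajectory of a hand position that is actually heading for goals[1].
traj = [np.zeros(3)]
for _ in range(5):
    traj.append(toy_step(traj[-1], goals[1]))
traj = np.array(traj)

best, scores = infer_goal_direct(traj, goals)
```

With only a handful of candidate objects, this exhaustive comparison avoids the gradient and Hessian approximations needed for numerical optimization.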

At block 360, instructions may be given to the robot 120, based in part on the estimated intention ĝ_(t), so as to enable the robot 120 to interact with the user performing the new reaching task. It will be understood that the robot's activities in interacting with the user may vary widely, depending on the task being performed by the robot 120, the user, or both. Thus, these instructions may be of various types, designed to perform these activities effectively. For example, and not by way of limitation, it may be determined that the estimated intention is a location that is the same as a position of a particular object in the workspace. In that case, the instructions may direct the robot 120 to avoid collision with a trajectory of the user's arm moving toward that particular object.

At block 365, using the intention estimate ĝ_(t), computed at block 355, the learning system 100 may compute a state identifier $\hat{\dot{x}}_{id_t}$, which may be used in updating the NN 170.

In some embodiments, the learning system 100 may continue to update the NN 170 in real time, while predictions of goal locations are being made, by way of an online learning algorithm. The online learning algorithm may be used to update the weights of the NN 170. In some embodiments, this online learning of the NN weights may make inferences of g robust to variations in starting arm positions and various motion trajectories taken by different people. In other words, the learning system 100 may continue to generate good results regardless of which user is performing a reaching task, and regardless of an initial arm position of that user.

Through application of the online learning algorithm, the NN weights may be updated iteratively as new data about the new reaching task becomes available. To this end, a state identifier may be used to compute an estimate of the state derivative based on current state estimates obtained from the EKF and the current NN weights. The learning system 100 may compute an error in the state identifier, based on the state estimate and measurement of the state. The error may be used to update the NN weights at the next time step. The state identifier may use a feedback term such as the robust integral of the sign of the error (RISE) to obtain asymptotic convergence of the state estimates and their derivatives to their true values.

Equations used to update the NN weights may use Lyapunov-based stability analysis. For instance, the state identifier is given by $\hat{\dot{x}}_{id_t} = \hat{W}_t^T \sigma(\hat{U}_t^T \hat{s}_t) + \mu_t$ (hereinafter “Formula 19”), where $\hat{U}_t \in \mathbb{R}^{28 \times n_h}$; $\hat{W}_t \in \mathbb{R}^{n_h \times 24}$; $\hat{s}_t = [[\hat{x}_{id_t}^T, \hat{g}_t], 1]^T \in \mathbb{R}^{28}$; $\hat{g}_t \in \mathbb{R}^3$ is the current estimate of g from the EM algorithm; $\hat{x}_{id_t} \in \mathbb{R}^{24}$ is the current identifier state; and $\mu_t \in \mathbb{R}^{24}$ is the RISE feedback term defined as $\mu_t = k\tilde{x}_t - k\tilde{x}_0 + v_t$, where $\tilde{x}_t = x_t - \hat{x}_{id_t}$ is the state identification error at time t; and $v_t \in \mathbb{R}^{24}$ is the Filippov generalized solution to the differential equation $\dot{v}_t = (k\alpha + \gamma)\tilde{x}_t + \beta_1 \mathrm{sgn}(\tilde{x}_t)$, for $v_0 = 0$, where $k, \alpha, \gamma, \beta_1 \in \mathbb{R}^+$ are positive constant control gains; and sgn( ) denotes a vector signum function.

The learning system 100 may use weight update equations given by $\hat{\dot{W}}_t = \mathrm{proj}(\hat{W}_t, \Gamma_w \hat{\sigma}' \hat{U}_{x_t}^T \hat{\dot{x}}_{id_t} \tilde{x}_t^T)$; $\hat{\dot{U}}_{x_t} = \mathrm{proj}(\hat{U}_{x_t}, \Gamma_{u_x} \hat{\dot{x}}_{id_t} \tilde{x}_t^T \hat{W}_t^T \hat{\sigma}')$; and $\hat{\dot{U}}_{g_t} = \mathrm{proj}(\hat{U}_{g_t}, \Gamma_{u_g} \hat{\dot{g}}_t \tilde{x}_t^T \hat{W}_t^T \hat{\sigma}')$ (hereinafter collectively “Formula 20”), where $\hat{U}_{x_t}$ and $\hat{U}_{g_t}$ are submatrices of $\hat{U}_t$ formed by taking the rows corresponding to $\hat{x}_{id_t}$ and $\hat{g}_t$ respectively; $\hat{\sigma}'$ is the first-order derivative of the sigmoid function with respect to its inputs; and $\Gamma_w$, $\Gamma_{u_x}$, and $\Gamma_{u_g}$ are constant weighting matrices of appropriate dimensions;

$\mathrm{proj}(\hat{\theta}, \phi) = \begin{cases} \phi & \text{if } \underline{\theta} \leq \hat{\theta} \leq \overline{\theta}, \text{ or } \hat{\theta} > \overline{\theta} \text{ and } \phi(t) \leq 0, \text{ or } \hat{\theta} < \underline{\theta} \text{ and } \phi(t) \geq 0 \\ \bar{\phi} & \text{if } \hat{\theta} > \overline{\theta} \text{ and } \phi(t) > 0 \\ \breve{\phi} & \text{if } \hat{\theta} < \underline{\theta} \text{ and } \phi(t) < 0 \end{cases}$ is a projection operator where

$\bar{\phi} \triangleq \left[1 + \frac{\overline{\theta} - \hat{\theta}}{\delta}\right]\phi$ and $\breve{\phi} \triangleq \left[1 + \frac{\hat{\theta} - \underline{\theta}}{\delta}\right]\phi$. In some embodiments, the projection operator ensures that the estimate $\hat{\theta} \in \Omega_\delta \ \forall t \geq 0$, where $\Omega_\delta = \{\hat{\theta} \mid \underline{\theta} - \delta \leq \hat{\theta} \leq \overline{\theta} + \delta\}$. For this online learning algorithm, the learning system 100 may use ĝ_(t) from the EM algorithm. Thus, for the purpose of the online learning, the learning system 100 may assume that ĝ_(t) is a known parameter. The variable $\hat{\dot{g}}_t$, the derivative of the intention estimate, may be computed using the finite difference method. It can be shown that, in some embodiments, the state identifier defined in Formula 19 along with the update equations defined in Formula 20 are asymptotically stable.
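The projection operator can be illustrated for a single scalar weight. The sketch below is a simplified, element-wise reading of the definition above; the function names `proj` and `integrate_weight`, the explicit Euler integration, and all numeric values are assumptions made for illustration and are not taken from this disclosure.

```python
def proj(theta_hat, phi, theta_min, theta_max, delta):
    """Scalar projection: scale the update phi once theta_hat leaves the
    interval [theta_min, theta_max] while still moving outward."""
    if theta_hat > theta_max and phi > 0:
        return (1 + (theta_max - theta_hat) / delta) * phi
    if theta_hat < theta_min and phi < 0:
        return (1 + (theta_hat - theta_min) / delta) * phi
    # Inside the bounds, or outside but moving back toward them.
    return phi


def integrate_weight(theta_hat, phi, theta_min, theta_max, delta, dt, steps):
    """Euler-integrate d(theta_hat)/dt = proj(theta_hat, phi) with constant phi."""
    for _ in range(steps):
        theta_hat += dt * proj(theta_hat, phi, theta_min, theta_max, delta)
    return theta_hat


# A weight pushed upward by a constant positive update settles near
# theta_max + delta instead of growing without bound.
bounded = integrate_weight(0.9, 5.0, 0.0, 1.0, 0.1, 0.01, 1000)
```

In this toy run the scaled update shrinks toward zero as the estimate approaches the upper bound plus δ, which is the behavior the set Ω_δ describes.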

At block 370, the learning system 100 may update the NN 170 by changing the weights, such as according to Formula 20, which appears above. In some embodiments, the method 300 may then return to block 340 to begin calculation of a new estimated intention ĝ_(t) at a new time step where the time step t is incremented. This iterative loop may continue as long as an estimated intention is desired for the new reaching task.

FIG. 4 is another flow diagram of a method 400 for inferring intentions of reaching tasks, with online updating, according to some embodiments of this disclosure. In some embodiments, the method 400 of FIG. 4 may occur after initial training of the NN 170 according to the method 200 of FIG. 2, and thus this method 400 may be used as an alternative to the method 300 of FIG. 3.

At block 425, the learning system 100 may record new measurements of a new reaching task being performed by a user. At block 430, the learning system 100 may generate new data from the recorded measurements, and that new data may include estimates of joint positions, velocities, and accelerations of the new reaching task. At block 435, the learning system 100 may identify the number of goal locations and create a model for each goal location. At block 440, the learning system 100 may initialize one or more parameters and variables used to estimate intentions. At block 445, an iterative loop may begin for determining an intention of a new reaching task. At block 450, the learning system 100 may read a current measurement of the new reaching task. At block 455, the learning system 100 may compute one or more mixing probabilities and may mix state estimates from previous iterations of the loop or, if this is the first iteration, from default parameters. At block 460, the learning system 100 may compute state estimates and covariances of the state estimates, and may model likelihoods based on values of the state estimates and covariances from previous iterations or, if this is the first iteration, from default parameters. At block 465, the learning system 100 may compute model probabilities and an estimate for the intention.

At block 470, the learning system 100 may instruct the robot 120 based on the intention estimate. In some embodiments, the method 400 may then return to block 445 to begin calculation of a new estimated intention ĝ_(t) at a new time step where the time step t is incremented. This iterative loop may continue as long as an estimated intention is desired for the new reaching task. Below follows a more detailed description of this method 400.
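One way to picture the model-probability step at block 465 is a Bayes-style update across the per-goal models. The sketch below assumes a multiple-model form in which each model's probability is its likelihood times a predicted prior, normalized over all models; this form, the fixed transition matrix, the choice of the most probable goal as the intention estimate, and the names `update_model_probabilities` and `most_likely_goal` are assumptions for illustration, not details stated in this disclosure.

```python
def update_model_probabilities(prior, transition, likelihoods):
    """Combine prior model probabilities with per-model likelihoods.

    prior[i]         -- probability of model i at the previous time step
    transition[i][j] -- assumed probability of switching from model i to j
    likelihoods[j]   -- likelihood of the current measurement under model j
    """
    m = len(prior)
    predicted = [sum(transition[i][j] * prior[i] for i in range(m))
                 for j in range(m)]
    unnormalized = [likelihoods[j] * predicted[j] for j in range(m)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def most_likely_goal(goals, probabilities):
    """Pick the goal location whose model currently has the highest probability."""
    best = max(range(len(goals)), key=lambda j: probabilities[j])
    return goals[best]


# Two hypothetical goal locations with one model each.
goals = [(0.5, 0.0, 0.1), (0.0, 0.6, 0.1)]
posterior = update_model_probabilities(
    prior=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    likelihoods=[0.2, 0.6],
)
intention = most_likely_goal(goals, posterior)
```

A probability-weighted average of the goal locations would be another plausible way to form the estimate under the same assumptions.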

Given a state variable $x \in \mathbb{R}^n$, a set of D demonstrations $\{D_i\}_{i=1}^{D}$ are solutions to an underlying dynamics model governed by the following first-order differential equation $\dot{x}(t) = f(x(t)) + w(t)$ (hereinafter “Formula 21”), where $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$ is a nonlinear, continuous, and continuously differentiable function, and $w \sim \mathcal{N}(0, Q_c)$ is a zero-mean Gaussian process with covariance $Q_c$. Each demonstration may be associated with trajectories of a state $\{x(t)\}_{t=0}^{t=T}$ and trajectories of the state derivative $\{\dot{x}(t)\}_{t=0}^{t=T}$ from time t=0 to t=T. The nonlinear function f may be modeled using an artificial neural network (ANN), specifically $f(x(t)) = W^T \sigma(U^T s(t)) + \epsilon(s(t))$. In this ANN description, $s(t) = [x(t)^T, 1]^T \in \mathbb{R}^{n+1}$ is the input vector to the ANN;

$\sigma(U^T s(t)) = \left[\frac{1}{1 + \exp(-(U^T s(t))_1)}, \ldots, \frac{1}{1 + \exp(-(U^T s(t))_i)}, \ldots, \frac{1}{1 + \exp(-(U^T s(t))_{n_h})}\right]^T$ is a vector-sigmoid activation function; $(U^T s(t))_i$ is the i^(th) element of the vector $(U^T s(t))$; $U \in \mathbb{R}^{(n+1) \times n_h}$ and $W \in \mathbb{R}^{n_h \times n}$ are bounded, constant-weight matrices; $\epsilon(s(t)) \in \mathbb{R}^n$ is a function reconstruction error that goes to zero after the ANN is fully trained; and $n_h$ is the number of neurons in the hidden layer of the ANN.
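The ANN model of f can be sketched as a single-hidden-layer forward pass. The Python below evaluates W^T σ(U^T s) with plain nested lists and, as an assumed usage, advances the state with a first-order Euler step that ignores the noise term w(t) and the reconstruction error. The function names (`sigmoid`, `ann_dynamics`, `euler_step`), the step size, and the weight values are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Scalar logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def ann_dynamics(x, U, W):
    """Evaluate W^T sigma(U^T s) with s = [x^T, 1]^T.

    U has shape (n+1) x n_h and W has shape n_h x n, as nested lists.
    The reconstruction error term is omitted.
    """
    s = list(x) + [1.0]
    n_h = len(W)
    n = len(W[0])
    hidden = [sigmoid(sum(U[i][j] * s[i] for i in range(len(s))))
              for j in range(n_h)]
    return [sum(W[j][k] * hidden[j] for j in range(n_h)) for k in range(n)]


def euler_step(x, U, W, dt):
    """First-order Euler discretization: x_next = x + dt * f(x)."""
    f = ann_dynamics(x, U, W)
    return [xi + dt * fi for xi, fi in zip(x, f)]


# Toy example with n = 2 states and n_h = 2 hidden neurons.
U = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # (n+1) x n_h
W = [[1.0, 2.0], [3.0, 4.0]]               # n_h x n
x_next = euler_step([0.3, -0.7], U, W, dt=0.1)
```

With all entries of U set to zero every hidden neuron outputs 0.5, which makes the toy result easy to check by hand.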

The learning system 100 may solve the following unconstrained optimization problem to train the ANN: $\{\hat{W}, \hat{U}\} = \arg\min_{W,U}\{\alpha E_D + \beta E_w\}$ (hereinafter “Formula 22”). In the above, $E_D = \sum_{i=1}^{D} [y_i - a_i]^T [y_i - a_i]$ is the sum of the squared error; $y_i \in \mathbb{R}^n$ and $a_i \in \mathbb{R}^n$ respectively represent the target and the network's output of the i^(th) demonstration; $E_w$ is the sum of the squares of the ANN weights; and $\alpha, \beta \in \mathbb{R}$ are scalar parameters of regularization.
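The objective in Formula 22 can be computed directly once targets, network outputs, and weights are available. The sketch below only evaluates the cost αE_D + βE_w; it does not perform the minimization, and the function name `regularized_cost` and the sample values are assumptions for illustration.

```python
def regularized_cost(targets, outputs, weights, alpha, beta):
    """Return alpha * E_D + beta * E_w.

    targets, outputs -- lists of n-dimensional vectors (y_i and a_i)
    weights          -- list of weight matrices (e.g. [W, U]) as nested lists
    """
    # E_D: sum over samples of [y_i - a_i]^T [y_i - a_i]
    e_d = sum((y - a) ** 2
              for y_i, a_i in zip(targets, outputs)
              for y, a in zip(y_i, a_i))
    # E_w: sum of the squares of all ANN weights
    e_w = sum(w ** 2 for matrix in weights for row in matrix for w in row)
    return alpha * e_d + beta * e_w


cost = regularized_cost(
    targets=[[1.0, 2.0]],
    outputs=[[0.0, 0.0]],
    weights=[[[1.0, 1.0]], [[2.0]]],
    alpha=1.0,
    beta=0.5,
)
```

A larger β penalizes large weights more heavily relative to the fitting error, which is the usual trade-off in this kind of regularized training.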

Technical effects and benefits of one or more embodiments include the ability to model human arm motion dynamics using a NN 170 and to infer human intentions using a NN-based approximate EM algorithm. Technical effects and benefits may further include the ability to use online learning to adapt to various initial arm configurations and to various users.

FIG. 5 illustrates a block diagram of a computer system 500 for use in implementing a learning system 100 or method according to some embodiments. The learning systems 100 and methods described herein may be implemented in hardware, software (e.g., firmware), or a combination thereof. In some embodiments, the methods described may be implemented, at least in part, in hardware and may be part of the microprocessor of a special or general-purpose computer system 500, such as a personal computer, workstation, minicomputer, or mainframe computer.

In some embodiments, as shown in FIG. 5, the computer system 500 includes a processor 505, memory 510 coupled to a memory controller 515, and one or more input devices 545 and/or output devices 540, such as peripherals, that are communicatively coupled via a local I/O controller 535. These devices 540 and 545 may include, for example, a printer, a scanner, a microphone, and the like. Input devices such as a conventional keyboard 550 and mouse 555 may be coupled to the I/O controller 535. The I/O controller 535 may be, for example, one or more buses or other wired or wireless connections, as are known in the art. The I/O controller 535 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications.

The I/O devices 540, 545 may further include devices that communicate both inputs and outputs, for instance disk and tape storage, a network interface card (NIC) or modulator/demodulator (for accessing other files, devices, systems, or a network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, and the like.

The processor 505 is a hardware device for executing hardware instructions or software, particularly those stored in memory 510. The processor 505 may be a custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer system 500, a semiconductor based microprocessor (in the form of a microchip or chip set), a microprocessor, or other device for executing instructions. The processor 505 includes a cache 570, which may include, but is not limited to, an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data. The cache 570 may be organized as a hierarchy of more cache levels (L1, L2, etc.).

The memory 510 may include one or a combination of volatile memory elements (e.g., random access memory, RAM, such as DRAM, SRAM, SDRAM, etc.) and nonvolatile memory elements (e.g., ROM, erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.). Moreover, the memory 510 may incorporate electronic, magnetic, optical, or other types of storage media. Note that the memory 510 may have a distributed architecture, where various components are situated remote from one another but may be accessed by the processor 505.

The instructions in memory 510 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 5, the instructions in the memory 510 include a suitable operating system (OS) 511. The operating system 511 essentially may control the execution of other computer programs and may provide scheduling, input-output control, file and data management, memory management, and communication control and related services.

Additional data, including, for example, instructions for the processor 505 or other retrievable information, may be stored in storage 520, which may be a storage device such as a hard disk drive or solid-state drive. The stored instructions in memory 510 or in storage 520 may include those enabling the processor to execute one or more aspects of the learning systems 100 and methods of this disclosure.

The computer system 500 may further include a display controller 525 coupled to a display 530. In some embodiments, the computer system 500 may further include a network interface 560 for coupling to a network 565. The network 565 may be an IP-based network for communication between the computer system 500 and an external server, client and the like via a broadband connection. The network 565 transmits and receives data between the computer system 500 and external systems. In some embodiments, the network 565 may be a managed IP network administered by a service provider. The network 565 may be implemented in a wireless fashion, e.g., using wireless protocols and technologies, such as WiFi, WiMax, etc. The network 565 may also be a packet-switched network such as a local area network, wide area network, metropolitan area network, the Internet, or other similar type of network environment. The network 565 may be a fixed wireless network, a wireless local area network (LAN), a wireless wide area network (WAN), a personal area network (PAN), a virtual private network (VPN), intranet or other suitable network system and may include equipment for receiving and transmitting signals.

Learning systems 100 and methods according to this disclosure may be embodied, in whole or in part, in computer program products or in computer systems 500, such as that illustrated in FIG. 5.

What is claimed is:
 1. A computer-implemented method of training a learning system to predict intentions of reaching tasks of an arm of a user, wherein the learning system controls a robot to avoid collisions and redundant actions with the person, wherein the learning system including a training unit and an on-line learning unit, wherein the training unit trains a neural network (NN) based on training data derived from one or more user demonstrations of reaching tasks; and wherein the online learning unit predicts intentions for new reaching tasks and updates the NN in real time based on test data derived from the reaching tasks, and the method comprising the learning system: recording, with a camera of the learning system, one or more demonstrations of the user to capture data z_(t) in three-dimensions, wherein each demonstration represents the user performing one or more reaching tasks via their arm, and wherein the captured data z_(t) represents positions in three dimensions of arm joints of the user, including shoulder, elbow, wrist, and palm, to describe overall movement of the arm at a time step t, wherein the arm joint positions are recorded in a camera reference frame corresponding to a point in a reference frame of the robot; deriving, from the measurements obtained from the captured data z_(t) at the time step t, a state x_(t), representing estimated positions, velocities and accelerations of the arm joints, wherein the state x_(t) depends on both a previous value of the state, at a previous time step t, and an estimated reaching intention g for the arm joints of the user that define a new reaching task, and wherein the arm joint positions, velocities, and accelerations are estimated by applying a Kalman filter to the recorded data; associating each of the one or more demonstrations with corresponding recorded goal locations defined by a culmination of the reaching task, and training the NN based on the one or more demonstrations; computing training data describing the one 
or more demonstrations from the estimates of the joint positions and the recorded goal locations; learning one or more weights of the neural network based on the computed training data, the neural network being configured to derive an estimated goal location of the one or more reaching tasks, wherein the one or more weights make inferences that estimate the reaching intention g based on starting arm positions and motion trajectories; recording a partial trajectory of a new reaching task; and computing, by a computer processor, the estimated goal location, defining the estimated reaching intention g, of the new reaching task by applying the neural network to the partial trajectory of the new reaching task; wherein computing the estimated goal location comprises solving a parameter inference problem by way of an expectation-maximization (EM) algorithm, wherein the EM algorithm utilizes a nonlinear state transition model, representing a positional state of the robot over time, that is discretized using first-order Euler approximation, and wherein the learning system computes an error in the state x_(t), based on the state estimate and measurement of the state, and wherein the error is utilized to update the NN weights at a next time step.
 2. The computer-implemented method of claim 1, further comprising executing an online learning algorithm to update the one or more weights of the neural network based at least in part on the estimated goal location of the new reaching task.
 3. The computer-implemented method of claim 2, wherein the online learning algorithm is configured to adapt the neural network to varying initial positions of an arm performing one or more future reaching tasks.
 4. The computer-implemented method of claim 1, wherein computing the estimated goal location comprises iteratively calculating the estimated goal location based on the neural network and a previous calculation of the estimated goal location.
 5. The computer-implemented method of claim 1, further comprising identifying an object reached for in the new reaching task.
 6. The computer-implemented method of claim 1, further comprising instructing a robot based on the estimated goal location.
 7. A learning system configured for being trained to predict intentions of reaching tasks of an arm of a user, wherein the learning system controls a robot to avoid collisions and redundant actions with the person, wherein the learning system includes a training unit and an on-line learning unit, wherein the training unit trains a neural network (NN) based on training data derived from one or more user demonstrations of reaching tasks; and wherein the online learning unit predicts intentions for new reaching tasks and updates the NN in real time based on test data derived from the reaching tasks, and the system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions comprising: recording, with a camera of the learning system, one or more demonstrations of the user to capture data z_(t) in three-dimensions, wherein each demonstration represents the user performing one or more reaching tasks via their arm, and wherein the captured data z_(t) represents positions in three dimensions of arm joints of the user, including shoulder, elbow, wrist, and palm, to describe overall movement of the arm at a time step t, wherein the arm joint positions are recorded in a camera reference frame corresponding to a point in a reference frame of the robot; deriving, from the measurements obtained from the captured data z_(t) at the time step t, a state x_(t), representing estimated positions, velocities and accelerations of the arm joints, wherein the state x_(t) depends on both a previous value of the state, at a previous time step t, and an estimated reaching intention g for the arm joints of the user that define a new reaching task, and wherein the arm joint positions, velocities, and accelerations are estimated by applying a Kalman filter to the recorded data; associating each of the one or more demonstrations with corresponding recorded goal locations defined by a 
culmination of the reaching task, and training the NN based on the one or more demonstrations; computing training data describing the one or more demonstrations from the estimates of the joint positions and the recorded goal locations; learning one or more weights of the neural network based on the computed training data, the neural network being configured to derive an estimated goal location of the one or more reaching tasks, wherein the one or more weights make inferences that estimate the reaching intention g based on starting arm positions and motion trajectories; recording a partial trajectory of a new reaching task; and computing, by a computer processor, the estimated goal location, defining the estimated reaching intention g, of the new reaching task by applying the neural network to the partial trajectory of the new reaching task; wherein computing the estimated goal location comprises solving a parameter inference problem by way of an expectation-maximization (EM) algorithm, wherein the EM algorithm utilizes a nonlinear state transition model, representing a positional state of the robot over time, that is discretized using first-order Euler approximation, and wherein the learning system computes an error in the state x_(t), based on the state estimate and measurement of the state, and wherein the error is utilized to update the NN weights at a next time step.
 8. The system of claim 7, the computer readable instructions further comprising executing an online learning algorithm to update the one or more weights of the neural network based at least in part on the estimated goal location of the new reaching task.
 9. The system of claim 8, wherein the online learning algorithm is configured to adapt the neural network to varying initial positions of an arm performing one or more future reaching tasks.
 10. The system of claim 7, wherein computing the estimated goal location comprises iteratively calculating the estimated goal location based on the neural network and a previous calculation of the estimated goal location.
 11. The system of claim 7, the computer readable instructions further comprising identifying an object reached for in the new reaching task.
 12. The system of claim 7, the computer readable instructions further comprising instructing a robot based on the estimated goal location. 